Overview

Differences from deterministic modeling

A stochastic model incorporates randomness; given the same inputs, a stochastic model may generate different outputs on different runs.

This is useful when the process you want to model is fundamentally stochastic, or if there is heterogeneity in its dynamics.

Unfortunately stochastic models can get complicated, and come with a lot of notation and formalism. Our goal here is to quickly review basic probability theory so that we can use stochastic models productively.

Random variables

A random variable is a variable whose value is random. In this course, random variables will have a real-world meaning, like:

  • \(X\) is the value of a fair coin toss (discrete, binary)
  • \(X\) is the number of new HIV infections diagnosed this week (discrete, count)
  • \(X\) is the number of people who had a drug overdose in the last 24 hours (discrete, count)
  • \(X\) is the survival time of a subject living with cancer (continuous, positive)

Probability

Probability \(\Pr(\cdot)\) is a function that takes a value a random variable can take on and returns a number between 0 and 1. The collection of these probabilities over all possible values is the probability distribution.

For discrete random variables, we will write the probability mass function \(\Pr(X=x)\). For continuous random variables, we will use the density, defined below.

You can interpret probability statements in terms of frequency, e.g. How often is \(X=x\) in repeated trials?
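The frequency interpretation is easy to see by simulation. A minimal sketch, using a fair coin toss as the hypothetical example:

```r
# Frequency interpretation: the proportion of repeated trials with X = x
# approaches Pr(X = x). Here X is a fair coin toss.
set.seed(1)
x <- sample(c(0, 1), size = 1e5, replace = TRUE)
mean(x == 1)  # close to 0.5
```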

Probability distributions and densities

The cumulative distribution function (CDF) is \(F(x) = \Pr(X \le x)\).

The density of a continuous random variable is \(f(x) = F'(x)\), the derivative of the CDF \(F(x)\) of \(X\), where it exists.

The density is like the probability mass function, but for continuous random variables.

Conditional probability

\(\Pr(Y=y|X=x)\) is the probability that \(Y=y\), given that \(X=x\). We treat the statement on the right-hand side of the conditioning bar \(|\) as known. The definition of conditional probability says how to compute it:

\[ \Pr(Y=y|X=x) = \frac{\Pr(Y=y,X=x)}{\Pr(X=x)} \]

(Bayes' theorem, \(\Pr(Y=y|X=x) = \Pr(X=x|Y=y)\Pr(Y=y)/\Pr(X=x)\), follows by applying this definition twice.)

Example: Let \(Y\) be HIV status and let \(X\) be age. \(\Pr(Y=1|X=x)\) is the HIV prevalence among people aged \(x\).
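The definition above is easy to apply numerically. A sketch with a small made-up joint distribution over a binary status \(Y\) and a two-level group \(X\) (the numbers are hypothetical):

```r
# Illustrating Pr(Y = y | X = x) = Pr(Y = y, X = x) / Pr(X = x)
# with a hypothetical joint distribution.
joint <- matrix(c(0.02, 0.48,   # row X = 1: Pr(Y=1,X=1), Pr(Y=0,X=1)
                  0.05, 0.45),  # row X = 2: Pr(Y=1,X=2), Pr(Y=0,X=2)
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("1", "2"), Y = c("1", "0")))
pr_x1 <- sum(joint["1", ])   # marginal Pr(X = 1) = 0.5
joint["1", "1"] / pr_x1      # Pr(Y = 1 | X = 1) = 0.04
```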

Expectation

The expected value of a random variable is its “average” value.

\[ E[X] = \sum_x x \Pr(X=x) \]

or

\[ E[X] = \int_{-\infty}^\infty x f(x) dx \]

where \(f(x)\) is the density of \(X\). The expectation is the average of values \(x\) that \(X\) can take on, weighted by their probability.

Can a random variable \(X\) have expectation equal to a value that \(X\) can never take? Yes. If \(X\sim \text{Bernoulli}(1/2)\), then \(\Pr(X=x)=1/2\) for \(x=1\) and \(x=0\). But \(E[X]=1/2\).
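The Bernoulli example works out as a probability-weighted sum:

```r
# E[X] as a probability-weighted sum, for X ~ Bernoulli(1/2):
vals  <- c(0, 1)
probs <- c(0.5, 0.5)
sum(vals * probs)  # 0.5, a value X never takes
```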

Conditional expectation

Conditional expectations are just expectations with respect to a conditional probability distribution, \[ E[Y|X=x] = \sum_y y\Pr(Y=y|X=x) \] \[ E[Y|X=x] = \int_{-\infty}^\infty y f(y|X=x) dy \]

Law of total probability

You can calculate "overall" or "marginal" probabilities by computing a weighted sum (or integral) over conditional probabilities:

\[ \Pr(X=x) = \sum_y \Pr(X=x|Y=y) \Pr(Y=y) \] \[ \Pr(X=x) = \int_{-\infty}^\infty \Pr(X=x|Y=y) f(y) dy \]

The law of total expectation is similar:

\[ E[X] = \sum_y E[X|Y=y] \Pr(Y=y) \] \[ E[X] = \int_{-\infty}^\infty E[X|Y=y] f(y) dy \]
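A quick numeric check of the law of total expectation, with hypothetical numbers:

```r
# Suppose Y takes values 1, 2 with Pr(Y=1) = 0.4, Pr(Y=2) = 0.6,
# and E[X|Y=1] = 10, E[X|Y=2] = 20 (hypothetical values).
p_y  <- c(0.4, 0.6)
e_xy <- c(10, 20)
sum(e_xy * p_y)  # E[X] = 16
```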

Independence

Two random variables are independent if their joint probability factorizes into the product of their marginal probabilities, \[ \Pr(X=x,Y=y) = \Pr(X=x) \Pr(Y=y) \] or \(f(x,y) = f(x)f(y)\).

Discrete probability distributions

Bernoulli

This is the classic coin-flipping distribution, taking value 0 or 1.

The random variable \(X\) has Bernoulli\((p)\) distribution if \(\Pr(X=1)=p\) and \(\Pr(X=0)=1-p\).

Binomial

A Binomial\((n,p)\) random variable is a sum of \(n\) independent Bernoulli\((p)\) variables. The probability mass function is \[ \Pr(X=k) = \binom{n}{k} p^k (1-p)^{n-k} \] This is useful for modeling the number of events that occur in \(n\) trials, with fixed probability of success.
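The sum-of-Bernoullis story can be checked by simulation against R's built-in pmf (the specific n and p here are arbitrary):

```r
# A Binomial(n, p) draw as a sum of n independent Bernoulli(p) draws,
# compared against the exact pmf from dbinom().
set.seed(2)
n <- 10; p <- 0.3
x <- replicate(1e4, sum(rbinom(n, size = 1, prob = p)))
mean(x == 3)     # empirical Pr(X = 3)
dbinom(3, n, p)  # exact Pr(X = 3)
```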

Poisson

A common probability model for count data is the Poisson distribution. If events occur at rate \(\lambda\) over a time interval of length \(t\), \[ \Pr(X = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \] This formula looks weird, but it has a nice story, which we will see when we talk about Poisson processes.

Continuous probability distributions

Uniform

The uniform distribution is usually defined on \([0,1]\). \(X \sim \text{Unif}(0,1)\) means the density is \[ f(x) = 1 \] for \(0\le x \le 1\). The uniform distribution represents a non-informative (flat) distribution over its support.

Beta

If \(X\) has beta distribution, the density is \[ f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} \] for \(0\le x \le 1\), where \(B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\). This distribution is useful for modeling probabilities or proportions.

Exponential

If \(X\) has exponential distribution, the density is \[ f(x) = \lambda e^{-\lambda x} \] for \(x \ge 0\). This distribution is useful for modeling waiting times to an event. \(E[X] = 1/\lambda\).

Normal

The normal\((\mu,\sigma^2)\) density is \[ f(x) = \frac{1}{\sqrt{2\pi} \sigma} \exp\left[ -\frac{1}{2\sigma^2} (x-\mu)^2 \right] \] This is the classic “Gaussian” or “bell-shaped” distribution, often used to model random noise or error about a mean value.

Other probability distributions

There are lots of other "canonical" probability distributions:

Discrete: geometric, negative binomial, beta-binomial

Continuous: Gamma, Log-Normal, Laplace, failure time distributions

We won’t use these in this class, but it’s important to know that there are lots of options for modeling random quantities. This is partly what the field of statistics is about.

Computing with probability trees

Suppose patients receive Treatment A with probability 0.4 and Treatment B with probability 0.6, and \[ \Pr(\text{Alive}|\text{Treatment A}) = 0.7 \] \[ \Pr(\text{Alive}|\text{Treatment B}) = 0.1 \] By the law of total probability, \[ \Pr(\text{Alive}) = 0.4 \times 0.7 + 0.6 \times 0.1 = 0.34 \]
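The tree computation is just the law of total probability, assuming treatment probabilities of 0.4 and 0.6 as the weights above imply:

```r
# Marginal Pr(Alive) as a weighted sum over treatment branches.
p_trt   <- c(A = 0.4, B = 0.6)  # Pr(Treatment)
p_alive <- c(A = 0.7, B = 0.1)  # Pr(Alive | Treatment)
sum(p_trt * p_alive)            # 0.34
```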

Dynamic stochastic models

Markov models

In this course, we will focus on Markov stochastic models in discrete and continuous time. A Markov model is one whose next event depends only on its current state, and not on its prior history (Bailey 1990; Renshaw 2015).

Discrete-time Markov models

Let \(X(t)\) be a Markov process taking \(n\) different states, in discrete time, \(t=1,2,\ldots\). Define the matrix of one-step transition probabilities from each state \(i\) to each state \(j\),

\[ \mathbf{P} = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix} \]

The rows of \(\mathbf{P}\) sum to one. The probability of moving from \(i\) to \(j\) in \(t\) steps is

\[ P_{ij}(t) = \Pr(X(t) = j | X(0) = i) \]

Compute transition probabilities by multiplying \(\mathbf{P}\) by itself \(t\) times:

\[ P_{ij}(t) = [\mathbf{P}^t]_{ij} \]
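A minimal sketch of this computation for a hypothetical two-state chain:

```r
# t-step transition probabilities by repeated matrix multiplication.
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
mat_pow <- function(P, t) Reduce(`%*%`, replicate(t, P, simplify = FALSE))
mat_pow(P, 3)  # P^3; each row still sums to one
```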

Continuous-time Markov models

We described deterministic systems in terms of their rates earlier. But a continuous-time stochastic model produces different output each time we run it. How can we describe its rates of change when its path is different every time?

We need to describe the rate of change of its transition probabilities. Consider a continuous-time Markov chain \(X(t)\) on a discrete state space.

Let \(\lambda_{ij}\) be the transition rate from state \(i\) to state \(j\), and define the rate matrix
\[ \mathbf{Q} = \begin{pmatrix} -\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{n1} & \lambda_{n2} & \cdots & -\lambda_{nn} \end{pmatrix} \]

where the diagonal entries are \(\lambda_{ii} = \sum_{j\neq i} \lambda_{ij}\), so each row of \(\mathbf{Q}\) sums to zero. The transition probabilities \(P_{ij}(t) = \Pr(X(t)=j|X(0)=i)\) can be computed with the matrix exponential, \(\mathbf{P}(t) = \exp[\mathbf{Q} t]\).
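A sketch of the matrix exponential for a hypothetical two-state rate matrix, using a truncated Taylor series in base R (in practice one would use a dedicated routine such as `expm` from the expm package):

```r
# P(t) = exp(Qt) via a truncated Taylor series exp(A) = sum_k A^k / k!.
Q <- matrix(c(-0.5,  0.5,
               0.3, -0.3), nrow = 2, byrow = TRUE)
mat_exp <- function(A, n_terms = 50) {
  out  <- diag(nrow(A))
  term <- diag(nrow(A))
  for (k in 1:n_terms) {
    term <- term %*% A / k
    out  <- out + term
  }
  out
}
P_t <- mat_exp(Q * 2)  # transition probabilities over t = 2
rowSums(P_t)           # each row sums to one
```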

Waiting times to jumps

Consider a continuous-time Markov model at time \(t\), where \(X(t)=i\). The rate of jumping to state \(j\) is \(\lambda_{ij}\). What does this mean?

It means that the jump to \(j\) occurs after a waiting time \(W_{ij}\) with distribution Exponential\((\lambda_{ij})\). If there are several states \(j\) with \(\lambda_{ij}>0\), then the waiting time to the next jump to any state is the minimum of all these exponential waiting times. Fortunately,

\[ \min\{W_{ij}:\ j\neq i\} \sim \text{Exponential}\left(\sum_{j\neq i} \lambda_{ij}\right) \]

Furthermore, the destination of the next jump is independent of this waiting time.
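A simulation sketch of the minimum-of-exponentials property, with hypothetical rates:

```r
# The minimum of independent exponential waiting times is exponential
# with the summed rate, so its mean is 1 / sum(rates).
set.seed(3)
rates <- c(0.5, 1.0, 1.5)
w_min <- replicate(1e4, min(rexp(3, rate = rates)))
mean(w_min)  # close to 1 / sum(rates) = 1/3
```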

Properties of continuous-time Markov models

Several properties of CTMCs are useful:

  1. Markov property
  2. Exponential waiting times to the next jump
  3. Independence of waiting time and destination
  4. Easy simulation

Drawbacks:

  1. Difficult to compute moments
  2. Difficult to compute transition probabilities
  3. Difficult to estimate parameters from discretely observed data
  4. Infinite state spaces require special methods

Poisson process

Now consider a continuous-time Markov chain whose only positive transition rates are to the next integer state, \(\lambda_{i,i+1}=\lambda\). This is called a counting process. Let \(X(t)=i\), so the waiting time to the jump to \(i+1\) is

\[ W_i \sim \text{Exponential}(\lambda) \]

following the last event. Given \(X(0)=0\), what is the distribution of the number of jumps \(X(t)\)?

\[ X(t) \sim \text{Poisson}(\lambda t) \]

So \(E[X(t)] = \lambda t\). This is where the Poisson distribution comes from.
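The story can be checked by simulation: accumulate exponential waiting times, count the jumps before time \(t\), and compare the mean count with \(\lambda t\).

```r
# Simulating a Poisson process from its exponential inter-event times.
set.seed(4)
lambda <- 2; t_end <- 5
count_jumps <- function() {
  time <- 0; n <- 0
  repeat {
    time <- time + rexp(1, rate = lambda)  # next waiting time
    if (time > t_end) return(n)
    n <- n + 1
  }
}
x <- replicate(1e4, count_jumps())
mean(x)  # close to lambda * t_end = 10
```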

Queues

A queue is a model for a list of individuals waiting to be served. Examples:

  • Emergency room beds
  • Vaccine inventory
  • HIV care pipeline

The simplest queue is a continuous-time Markov model with arrival rate \(\lambda_{i,i+1}=\lambda\) and service rate \(\lambda_{i,i-1}=\mu\).

A slightly more complicated queue has \(\lambda_{i,i+1}=\lambda\) and \(\lambda_{i,i-1}=i\mu\), so the total departure rate grows with the number of individuals present.

Queues are useful because they can be combined into a stochastic model for sequential movement through a series of compartments or states.
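A minimal simulation of the simplest queue described above (arrivals at rate \(\lambda\), departures at rate \(\mu\) when the queue is non-empty); the rate values are hypothetical:

```r
# Simulate the queue's jump chain: at each jump, an arrival occurs with
# probability lambda / (lambda + mu), otherwise a departure.
set.seed(5)
lambda <- 1; mu <- 1.5; t_end <- 100
t <- 0; n <- 0
times <- c(); sizes <- c()
while (t < t_end) {
  rate_total <- lambda + if (n > 0) mu else 0
  t <- t + rexp(1, rate = rate_total)          # waiting time to next jump
  if (runif(1) < lambda / rate_total) n <- n + 1 else n <- n - 1
  times <- c(times, t); sizes <- c(sizes, n)
}
tail(sizes, 1)  # queue length at the end of the simulation
```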

Example of a queueing model

Gonsalves et al. (2017)

Simulating continuous-time Markov models

Gillespie algorithm

The “Gillespie algorithm” performs forward simulation of a stochastic compartmental model (Pineda-Krch (2008)).

You don’t really need to know how it works. Forward simulation of continuous-time Markov models is computationally straightforward.

The ssa function in the GillespieSSA package implements a fast version of the Gillespie algorithm.

Example: SIR

library(GillespieSSA)
parms <- c(beta = 0.005, gamma = 0.1)  # infection and recovery rates
x0 <- c(S = 499, I = 1, R = 0)         # initial state
a <- c("beta*S*I", "gamma*I")          # propensity (rate) functions
nu <- matrix(c(-1,  0,                 # state change per reaction:
               +1, -1,                 # column 1 = infection,
                0, +1),                # column 2 = recovery
    nrow = 3, byrow = TRUE)
out <- ssa(x0, a, nu, parms, tf = 100, simName = "SIR model")

Example: SIR

plot(out$data[, 1], out$data[, 2], col = "green",  # S over time
    ylim = c(0, 500), type = "l", xlab = "Time", 
    ylab = "Number", bty = "n")
lines(out$data[, 1], out$data[, 3], col = "red")   # I
lines(out$data[, 1], out$data[, 4], col = "blue")  # R
legend(50, 500, c("S", "I", "R"), pch = 16, 
    col = c("green", "red", "blue"))

Explanation of ssa

ssa(x0, a, nu, parms, tf)

Here, x0 is the starting state, a is the vector of propensity (rate) functions, nu is the matrix of state changes for each reaction, parms is the named vector of parameters, and tf is the final time.

The function returns a data frame with jump times and states.

head(out$data)
##                        S I R
## timeSeries 0.0000000 499 1 0
##            0.8529404 499 0 1

“Statistical” models

“Statistical models” are stochastic because they contain random variables. Consider the usual Normal/Gaussian linear model \[ y = \alpha + \beta x + \epsilon \] where \(\epsilon \sim N(0,\sigma^2)\). This model is stochastic because \(\epsilon\) is random.
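Simulating from this model makes its stochastic nature concrete; the parameter values below are arbitrary:

```r
# Simulate y = alpha + beta * x + eps with eps ~ N(0, sigma^2),
# then recover the parameters with lm().
set.seed(6)
alpha <- 1; beta <- 2; sigma <- 0.5
x <- runif(200)
y <- alpha + beta * x + rnorm(200, sd = sigma)
coef(lm(y ~ x))  # estimates near (alpha, beta) = (1, 2)
```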

In my opinion, you should think of so-called statistical models as mechanistic.

References

Bailey, Norman T. 1990. The Elements of Stochastic Processes with Applications to the Natural Sciences. Vol. 25. John Wiley & Sons.

Gonsalves, Gregg S, A David Paltiel, Paul D Cleary, Michael J Gill, Mari M Kitahata, Peter F Rebeiro, Michael J Silverberg, et al. 2017. “A Flow-Based Model of the HIV Care Continuum in the United States.” Journal of Acquired Immune Deficiency Syndromes 75 (5): 548–53.

Pineda-Krch, Mario. 2008. “GillespieSSA: Implementing the Stochastic Simulation Algorithm in R.” Journal of Statistical Software 25 (12): 1–18.

Renshaw, Eric. 2015. Stochastic Population Processes: Analysis, Approximations, Simulations. Oxford University Press.